Recurrent Event Modelling with SurPyval
=======================================

This section is aims to show how you can use SurPyval to model counting
processes.

Recurrent Event SurPyval Modelling
----------------------------------

First, we will start with recurrent events, and a simple non-parametric model.

Non-Parametric Counting Model with Surpyval
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Creating a Non-Parametric MCF with surpyval is easy. Simply collect the data and
pass it to the ``fit`` call of the ``NonParametricCounting`` class.


.. jupyter-execute::

    from surpyval.recurrent import NonParametricCounting
    import numpy as np

    x = [1, 2, 3, 4, 5, 6, 7]

    model = NonParametricCounting.fit(x)
    model.plot()

This shows the expected number of events at any time. The model is a step
function since it is parametric and we have made no assumptions about the
count between observed events.

The result of this is a Non-Parametric Counting model that can be used just like
all other models in surpyval. It is important to note that the ``fit`` function
takes the values of x as the *cumulative* time to the event, not the inter-arrival
time. If you do have inter-arrival data (which is sorted in the correct order)
all you need do is take the cumulative sum of the obervations along the length
of the array. For example:

.. jupyter-execute::

    from surpyval.recurrent import NonParametricCounting
    import numpy as np

    interarrival_times = [1, 1, 2, 4, 3, 1, 2, 1]
    x = np.cumsum(interarrival_times)

    model = NonParametricCounting.fit(x)

We can then use this model to estimate the number of failures at any time. For
example, let's say we wanted to know how many failures we would expect to see
after 10 units of time. We can do this by using the ``mcf`` method of the model.

.. jupyter-execute::

    from surpyval.recurrent import NonParametricCounting
    import numpy as np

    interarrival_times = [1, 1, 2, 4, 3, 1, 2, 1]
    x = np.cumsum(interarrival_times)

    model = NonParametricCounting.fit(x)
    model.mcf(10)


The above two examples use only one item, but we can get the expected number
of events based on data from any number of items. Let's say we had three items
observed until the last event. Let's do some modelling.

.. jupyter-execute::

    from surpyval.recurrent import NonParametricCounting
    import numpy as np

    x = [1, 2, 3, 4, 5, 6, 7, 1, 4, 6, 9, 2, 7, 8, 9]
    i = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

    model = NonParametricCounting.fit(x, i=i)
    model.plot()

Here we have the expected number of events over time based on the observations
of three different items.

These functions work with censoring as well. We need to keep in mind that the
only right censored points we can have for an item is the last. This is because
it doesn't make any sense to have a right censored point followed by another
event. The same is true for left censored and truncated data. Therefore the
"timeline" for a single item must be coherent for the model to work.

Let's look at how we can use right censoring.

.. jupyter-execute::

    from surpyval.recurrent import NonParametricCounting
    import numpy as np

    x = [1, 2, 3, 4, 5, 6, 7, 1, 4, 6, 9, 2, 7, 8, 9]
    i = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
    c = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]

    model = NonParametricCounting.fit(x, i=i, c=c)
    model.plot()

At present the ``NonParametricCounting`` model does not support fitting with
truncated data.

Let's say this data was for the time, in years,
between repairs on home air conditioners of a specific model. We can then use
this model to estimate the number of reparis we would need on a newly installed
air conditioner. Let's say we wanted to know how many repairs we would expect to see
after 8 years. We can do this by using the ``mcf`` method of the model.

If however, we wanted to know how many reparis were needed after 10 years, we
could not do so since the data only goes up to 9 years. To address this we would
instead need to use a parametric model.

Parametric Recurrent Event Models with Surpyval
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Just as is the case with single event survival analysis, non-parametric models
are not always the best choice. In the case of recurrent events, we can use
parametric models to model the number of events at any time. This is done by
assuming a hazard rate for the inter-arrival times. This also has the same
limitations as per single event survival analysis. That is, given we use a
parametric representation of the hazard rate we are making assumptions about the
shape of the cumulative incidence function. This allows us to extrapolate
above the highest observed values but may not be a good fit to the data.

Let's fit a parametric model.

.. jupyter-execute::

    from surpyval.recurrent import HPP
    import numpy as np

    x = [1, 2, 3, 4, 5, 6, 7, 1, 4, 6, 9, 2, 7, 8, 9]
    i = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
    c = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]

    model = HPP.fit(x, i=i, c=c)
    model.plot()

This model is a good fit to the data, althouhg it is just a straight line. But
we can extraplotate above the highest observed value. Let's say we wanted to
know how many events would happen up to "15", we can do this with the ``cif``
method of the model.

.. jupyter-execute::

    model.cif(15)

This means that we would expect to see 7.2 events up to "15" (in whatever units
this model is in). Let's see a different example:

.. jupyter-execute::

    x = [1, 5, 8, 10, 12, 13, 13, 14]
    HPP.fit(x).plot()


This HPP model, in this case, is not a good fit to the data. This is because
the model assumes that the accumulation of events will tend to be a straight
line whereas the data appears to be increasing over time. In this case, we have
made a poor assumption in using the HPP model. Let's try another one.


.. jupyter-execute::

    from surpyval.recurrent import Duane
    x = [1, 5, 8, 10, 12, 13, 13, 14]

    model = Duane.fit(x)
    model.plot()

This is clearly a much better fit. Have a look at the api documentation to see
what other parametric models are available in SurPyval.

Renewal Modelling in SurPyval
-----------------------------

In contrast to the above, where the cumulative count of events are assumed to
have an underlying rate of occurence, renewal models assume that there is an
underlying distribution of the inter-arrival times where each subsequent
inter-arrival time is affected by some restoration factor.

Generalised Renewal Process with SurPyval
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generalized Renewal Process modelling is simple with SurPyval:

.. jupyter-execute::

    from surpyval import Weibull
    from surpyval.recurrent import GeneralizedRenewal, NonParametricCounting
    import numpy as np

    x = np.array([1, 2, 3, 4, 4.5, 5, 5.5, 5.7, 6])

    model = GeneralizedRenewal.fit(x, dist=Weibull)
    model

We cannot plot the cumulative incidence function of the model since it does
not have a closed form solution. We can however plot the cumulative incidence
function of a monte carlo simulation of the model. Let's do that and compare
it to a non-parametric description of the MCF:

.. jupyter-execute::

    np_model = model.count_terminated_simulation(len(x), 5000)
    ax = np_model.plot()
    NonParametricCounting.fit(x).plot(ax=ax)

We have simulated the model we created up to the number of failures we saw in
the data with the ``count_terminated_simulation`` method. This method takes
two arguments, the first is the number of failures to simulate up to and the
second is the number of simulations to run. The more simulations you run the
more accurate the model will be. The method returns a ``NonParametricCounting``
model that can be used to plot the results.

You can see that the cumulative incidence function of the model is a very good
fit to the data. You can also see that it is "wavy." This is because the
underlying distribution is Weibull with a reasonably high shape parameter. This
means that the first inter-arrival time is going to be within a relatively
narrow period. After the first failure, and the subsequent restoration, the
next inter-arrival time is going to be in a larger range since it will be the
sum of the first inter-arrival time and the second inter-arrival time. This
process continues for each subsequent inter-arrival time. Eventually the waves
will become smaller as the mixing of previous inter-arrival times makes the
spread of the next inter-arrival time larger and larger. It looks essentially
like a smooth line at the higher values.

SurPyval uses the Kijima Type i as the default. Let's change this to
Kijima Type ii and see what happens.

.. jupyter-execute::

    from surpyval import Weibull
    from surpyval.recurrent import GeneralizedRenewal, NonParametricCounting
    import numpy as np

    x = np.array([1, 2, 3, 4, 4.5, 5, 5.5, 5.7, 6])

    model = GeneralizedRenewal.fit(x, dist=Weibull, kijima="ii")

    np_model = model.count_terminated_simulation(len(x), 5000)
    ax = np_model.plot()
    NonParametricCounting.fit(x).plot(ax=ax)

We can see that this model is not as good a fit as the kijima type i model.
This implies that the restoration that is done only repairs damage done since
the last event. We could then use this model, via the non-parametric
simulations of it, to estimate the number of events up to a given time.

G1 Renewal Process with SurPyval
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

G1 Modelling can easily be done with SurPyval:

.. jupyter-execute::

    from surpyval import Exponential
    from surpyval.recurrent import GeneralizedOneRenewal
    import numpy as np
    x = np.array([3, 6, 11, 5, 16, 9, 19, 22, 37, 23, 31, 45]).cumsum()

    model = GeneralizedOneRenewal.fit(x, dist=Exponential)
    model


This data is from [1]_ and shows the inter-arrival times, and not the total
time to each event. We therefore need to take the cumulative sum of all the
times before passing it to the ``fit`` method. These are the same results as
achieved by Kaminskiy and Krivtsov in their paper [2]_ introducing the G1
Renewal Process.

Surpyval allows you to use any distribution in SurPyval as the underlying
distribution. Let's use the same data with a Weibull G1 Renewal Process.


.. jupyter-execute::
    :stderr:

    from surpyval import Weibull
    from surpyval.recurrent import GeneralizedOneRenewal, NonParametricCounting
    import numpy as np
    x = np.array([3, 6, 11, 5, 16, 9, 19, 22, 37, 23, 31, 45]).cumsum()

    model = GeneralizedOneRenewal.fit(x, dist=Weibull)
    model

We can see that the restoration factor is quite similar. What is interesting is
that the underlying Weibull distribution has a shape parameter greater than 1.
This indicates that the underlying distribution is not exponential. Since the
G1 Renewal Process does not have a closed form solution for the cif we can
create a non-parametric model from a monte carlo simulation. Let's do this and
compare it to the data MCF.

.. jupyter-execute::

    np_model = model.time_terminated_simulation(250, 1000)
    np_model.plot()
    NonParametricCounting.fit(x).plot()


In this code we created a ``NonParametricCounting`` model using the G1 Models
``time_terminated_simulation`` method. This method takes two arguments, the
first is the time to run the simulation to while the second is the number of
simulations to run. The more simulations you run the more accurate the model
will be. The method returns a ``NonParametricCounting`` model that can be
used to plot the results. We then also add the raw data to the plot for
comparison.

The image above shows that the blue line (the model from the simulation) is in
very good agreement to the data. This is a good indication that the underlying
distribution is Weibull and that the repair effectiveness has been correctly
estimated.

References
----------

.. [1] Basu, A.P. and Rigdon, S.E., 2000. Statistical methods for the reliability of repairable systems. John Wiley & Sons.

.. [2] Kaminskiy, M.P. and Krivtsov, V.V., 2010. G1-renewal process as repairable system model. Reliability: Theory & Applications, 5(3), pp.7-14.